The present disclosure relates generally to the field of information extraction, and more particularly to generating and maintaining a custodian directory by extracting identity information from documents.
In many circumstances, it is important to be able to retrieve all emails and documents associated with a particular custodian. For example, in a lawsuit involving a corporation, emails and documents written by, or directed to, specific individuals within the corporation are often requested during the discovery phase. Because people often change email addresses over time, or use multiple email addresses (e.g., a work email address and a personal email address) or user IDs to conduct work, it can be extremely difficult and time-consuming to retrieve all of the requested emails and documents associated with those individuals. Retrieval can be particularly difficult when the request is for all documents sent to an employee of the company by a specific person who works for a different company.